VECTORS FOR EXPRESSION OF HML-2 POLYPEPTIDES

All publications and patent applications mentioned in this specification are incorporated
herein by reference to the same extent as if each individual document were specifically and
individually indicated to be incorporated by reference.

TECHNICAL FIELD 

The present invention relates to nucleic acid vectors for polypeptide expression. 

BACKGROUND ART 

Prostate cancer is the most common type of cancer in men in the USA. Benign prostatic 
hyperplasia (BPH) is the abnormal growth of benign prostate cells in which the prostate grows 
and pushes against the urethra and bladder, blocking the normal flow of urine. More than half of 
the men in the USA aged 60-70 and as many as 90% aged 70-90 have symptoms of
BPH. Although BPH is seldom a threat to life, it may require treatment to relieve symptoms. 

References 1 and 2 disclose that human endogenous retroviruses (HERVs) of the HML-2 
subgroup of the HERV-K family show up-regulated expression in prostate tumors. This finding 
is disclosed as being useful in prostate cancer screening, diagnosis and therapy. In particular, 
higher levels of an HML-2 expression product relative to normal tissue are said to indicate that 
the patient from whom the sample was taken has cancer. 

Reference 3 discloses that a specific member of the HML-2 family located in chromosome
22 at 20.428 megabases (22q11.2) is preferentially and significantly up-regulated in prostate
tumors. This endogenous retrovirus (termed PCAV) has several features not found in other
members of the HERV-K family: (1) it has a specific nucleotide sequence which distinguishes it
from other HERVs within the genome; (2) it has tandem 5' LTRs; (3) it has a fragmented 3' LTR;
(4) its env gene is interrupted by an Alu insertion; and (5) its gag contains a unique insertion.
Reference 3 teaches that these features can be exploited in prostate cancer screening, diagnosis 
and therapy. 

References 1 to 3 disclose in general terms vectors for expression of HML-2 and PCAV 
polypeptides. It is an object of the invention to provide additional and improved vectors for in 
vitro or in vivo expression of HML-2 and PCAV polypeptides. 

DISCLOSURE OF THE INVENTION

The invention provides a nucleic acid vector comprising: (i) a promoter; (ii) a sequence 
encoding a HML-2 polypeptide operably linked to said promoter; and (iii) a selectable marker. 
Preferred vectors further comprise (iv) an origin of replication; and (v) a transcription terminator 
downstream of and operably linked to (ii).

Vectors of the invention are particularly useful for expression of HML-2 polypeptides 
either in vitro (e.g. for later purification) or in vivo (e.g. for nucleic acid immunization). For use
in nucleic acid immunization it is preferred that (i) & (v) should be eukaryotic and (iii) and (iv) 
should be prokaryotic. 

THE PROMOTER

Vectors of the invention include a promoter. It is preferred that the promoter is functional 
in (i.e. can drive transcription in) a eukaryote. The eukaryote is preferably a mammal and more
preferably a human. The promoter is preferably active in vivo. 

The promoter may be a constitutive promoter or it may be a regulated promoter. 
The promoter may be specific to particular tissues or cell types, or it may be active in many
tissues.

Preferred promoters are viral promoters e.g. from cytomegalovirus (CMV). Where 
viral-based systems are used for delivery, the promoter can be a promoter associated with the 
respective virus e.g. a vaccinia promoter can be used with a vaccinia virus delivery system, etc. 

The vector may also include transcriptional regulatory sequences (e.g. enhancers) in
addition to the promoter and which interact functionally with the promoter.

Preferred vectors include the immediate-early CMV enhancer/promoter, and more
preferred vectors also include CMV intron A. This was originally isolated from the Towne strain
and is very strong. The complete native human immediate-early CMV transcription control unit
is divided schematically into four regions from 5' to the ATG of the sequence whose
transcription is controlled: I - modulator region (clusters of nuclear factor 1 binding sites); II -
enhancer region; III - promoter region; and IV - 5' UTR with intron A. In the native virus,
Region I includes upstream sequences that modulate expression in specific cell types and clusters
of nuclear factor 1 (NF1) binding sites. Region I can be inhibitory in many cell lines and is
generally omitted from vectors of the invention. Regions II and III are generally included in
vectors of the invention. Intron A (in Region IV) positively regulates expression in many
transformed cell lines and its inclusion enhances expression.

The promoter in vectors of the invention is operably linked to a downstream sequence
encoding a HML-2 polypeptide, such that expression of the encoding sequence is under the 
promoter's control. 

THE SEQUENCE ENCODING A HML-2 POLYPEPTIDE

Vectors of the invention include a sequence which encodes a HML-2 polypeptide. The
HML-2 is preferably PCAV.

HML-2 is a subgroup of the HERV-K family [4]. HERV isolates which are members of the 
HML-2 subgroup include HML-2.HOM [5] (also called ERVK6), HERV-K10 [6,7],
HERV-K108 [8], the 27 HML-2 viruses shown in Figure 4 of reference 9, HERV-K(C7) [10],
HERV-K(II) [11] and HERV-K(CH) [1,2]. Because HML-2 is a well-recognized family, the skilled
person will be able to determine without difficulty whether any particular HERV-K is or is not a 
HML-2 e.g. by reference to the HERVd database [12]. 

It is preferred to use sequences from HML-2.HOM, located on chromosome 7 [5, 13], or 
PCAV [3]. PCAV is a member of the HERV-K sub-family HML2.0, and SEQ ID 75 is the
12366 bp sequence of PCAV, based on available human chromosome 22 sequence [14], from the
beginning of its first 5' LTR to the end of its fragmented 3' LTR. It is the sense strand of the
double-stranded genomic DNA. The transcription start site seems to be at nucleotide 635+5, and
its poly-adenylation site is at nucleotide 11735.

The HML-2 polypeptide may be from the gag, prt, pol, env, or cORF regions. HML-2
transcripts which encode these polypeptides are generated by alternative splicing of the
full-length mRNA copy of the endogenous viral genome [e.g. Figure 4 of ref. 15, Figure 1A of
ref. 16, Figure 9 herein]. Although some HML-2 viruses encode all five polypeptides (e.g.
ERVK6 [5]), the coding regions of most contain mutations which result in one or more coding
regions being either mutated or absent. Thus not all HML-2 HERVs have the ability to encode
all five polypeptides.

HML-2 gag polypeptide is encoded by the first long ORF in a complete HML-2 genome 
[17]. Full-length gag polypeptide is proteolytically cleaved. Examples of gag nucleotide 
sequences are: SEQ ID 1 (HERV-K108); SEQ ID 2 (HERV-K(C7)); SEQ ID 3 (HERV-K(II)); 
SEQ ID 4 (HERV-K10); and SEQ ID 76 (PCAV). Examples of gag polypeptide sequences are:
SEQ ID 5 (HERV-K(C7)); SEQ ID 6 (HERV-K(II)); SEQ IDs 7 & 8 (HERV-K10); SEQ ID 9
('ERVK6'); SEQ ID 69; and SEQ ID 78 (PCAV).

HML-2 prt polypeptide is encoded by the second long ORF in a complete HML-2 genome. 
It is translated as a gag-prt fusion polypeptide. The fusion polypeptide is proteolytically cleaved 
to give a protease. Examples of prt nucleotide sequences are: SEQ ID 10 [HERV-K(108)]; SEQ 
ID 11 [HERV-K(II)]; SEQ ID 12 [HERV-K10]. Examples of prt polypeptide sequences are:
SEQ ID 13 [HERV-K10]; SEQ ID 14 ['ERVK6']; SEQ ID 71. 

HML-2 pol polypeptide is encoded by the third long ORF in a complete HML-2 genome. It 
is translated as a gag-prt-pol fusion polypeptide. The fusion polypeptide is proteolytically 
cleaved to give three pol products — reverse transcriptase, endonuclease and integrase [18]. 
Examples of pol nucleotide sequences are: SEQ ID 15 [HERV-K(108)]; SEQ ID 16 
[HERV-K(C7)]; SEQ ID 17 [HERV-K(II)]; SEQ ID 18 [HERV-K10]. Examples of pol 
polypeptide sequences are: SEQ ID 19 [HERV-K(C7)]; SEQ ID 20 [HERV-K10]; SEQ ID 21 
['ERVK6']; SEQ ID 73. 

HML-2 env polypeptide is encoded by the fourth long ORF in a complete HML-2 genome. 
The translated polypeptide is proteolytically cleaved. Examples of env nucleotide sequences are: 
SEQ ID 22 [HERV-K(108)]; SEQ ID 23 [HERV-K(C7)]; SEQ ID 24 [HERV-K(II)]; SEQ ID 25
[HERV-K10]. Examples of env polypeptide sequences are: SEQ ID 26 [HERV-K(C7)]; SEQ ID
27 [HERV-K10]; SEQ ID 28 ['ERVK6'].

HML-2 cORF polypeptide is encoded by an ORF which shares the same 5' region and start 
codon as env. After around 87 codons, a splicing event removes env-coding sequences and the 
cORF-coding sequence continues in the reading frame +1 relative to that of env [19, 20]. cORF 
has also been called Rec [21]. Examples of cORF nucleotide sequences are: SEQ IDs 29 & 30 
[HERV-K(108)]. An example of a cORF polypeptide sequence is SEQ ID 31. 

The HML-2 polypeptide may alternatively be from a PCAP open-reading frame [22], such 
as PCAP1, PCAP2, PCAP3, PCAP4, PCAP4a or PCAP5 (SEQ IDs 32 to 37 herein). PCAP3
(SEQ IDs 34 & 46) and PCAP5 (SEQ ID 37) are preferred.

The HML-2 polypeptide may alternatively be one of SEQ IDs 38 to 50 [22]. 

Sequences encoding any HML-2 polypeptide expression product may be used in 
accordance with the invention (e.g. sequences encoding any one of SEQ IDs 5, 6, 7, 8, 9, 13, 14,
19, 20, 21, 26, 27, 28, 31-50, 69-74, 78 or 79). 

The invention may also utilize sequences encoding polypeptides having at least a%
identity to such wild-type HML-2 polypeptide sequences. The value of a may be 65 or more (e.g.
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 
92, 93, 94, 95, 96, 97, 98, 99, 99.5, 99.9). These sequences include allelic variants, SNP variants, 
homologs, orthologs, paralogs, mutants etc. of the SEQ IDs listed in the previous paragraph. 
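
As an illustration of how a percent-identity threshold of this kind might be checked, the following is a minimal sketch only. It assumes the two sequences have already been aligned to equal length, with '-' marking gaps; the function names, the gap handling and the toy sequences are hypothetical and are not taken from the specification, which does not prescribe any particular alignment method.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the aligned columns of two pre-aligned sequences.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gap positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # column with gaps in both sequences carries no information
        compared += 1
        if x == y:
            matches += 1  # a gap opposite a residue never matches
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, a: float = 65.0) -> bool:
    """True if the identity is at least a% (65 is the lowest value named in the text)."""
    return percent_identity(aligned_a, aligned_b) >= a


if __name__ == "__main__":
    # Toy sequences, not real HML-2 sequences.
    print(percent_identity("MKVL", "MKAL"))          # one mismatch in four columns
    print(meets_identity_threshold("MKVL", "MAAL"))  # two mismatches in four columns
```

Different alignment tools count gapped columns differently, so a reported identity depends on the convention chosen; the sketch counts a gap opposite a residue as a mismatch.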

The invention may also utilize sequences having at least b% identity to wild-type HML-2 
nucleotide sequences. The value of b may be 65 or more (e.g. 66, 67, 68, 69, 70, 71, 72, 73, 74, 
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
99.5, 99.9). These sequences include allelic variants, SNP variants, homologs, orthologs,
paralogs, mutants etc. of SEQ IDs 1, 2, 3, 4, 10, 11, 12, 15, 16, 17, 18, 22, 23, 24, 25, 29 and 30.

The invention may also utilize sequences comprising a fragment of at least c nucleotides of
such wild-type HML-2 nucleotide sequences. The value of c may be 7 or more (e.g. 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, 100,
125, 150, 175, 200, 250, 300 or more). The fragment preferably encodes a proteolytic cleavage
product of a HML-2 polyprotein. The fragment preferably comprises a sequence encoding a T-cell or,
preferably, a B-cell epitope from HML-2. T- and B-cell epitopes can be identified empirically
(e.g. using the PEPSCAN method [23, 24] or similar methods), or they can be predicted e.g.
using the Jameson-Wolf antigenic index [25], matrix-based approaches [26], TEPITOPE [27],
neural networks [28], OptiMer & EpiMer [29, 30], ADEPT [31], Tsites [32], hydrophilicity [33],
antigenic index [34] or the methods disclosed in reference 35 etc.

The invention may also utilize sequences encoding a polypeptide which comprises a
fragment of at least d amino acids of wild-type HML-2 polypeptide sequences. The value of d
may be 7 or more (e.g. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35,
40, 45, 50, 60, 70, 75, 80, 90, 100, 125, 150, 175, 200, 250, 300 or more). The fragment
preferably comprises a T-cell or, preferably, a B-cell epitope from HML-2.

The invention may also utilize sequences comprising (i) a first sequence which is a
wild-type HML-2 sequence or a sequence as disclosed above and (ii) a second non-HML-2
sequence. Examples of (ii) include sequences encoding: signal peptides, protease cleavage sites, 
epitopes, leader sequences, tags, fusion partners, N-terminal methionine, arbitrary sequences etc. 
Sequence (ii) will generally be located at the N- and/or C-terminus of (i). 

Even though a nucleotide sequence may encode a HML-2 polypeptide which is found
naturally, it may differ from the corresponding natural nucleotide sequence. For example, the
nucleotide sequence may include mutations e.g. to take into account codon preference in a host 
of interest, or to add restriction sites or tag sequences. 

THE SELECTABLE MARKER 

Vectors of the invention include a selectable marker.

The marker preferably functions in a microbial host (e.g. in a prokaryote, in a bacterium, in a
yeast). The marker is preferably a prokaryotic selectable marker (e.g. transcribed under the
control of a prokaryotic promoter).

For convenience, typical markers are antibiotic resistance genes. 

FURTHER FEATURES OF NUCLEIC ACID VECTORS OF THE INVENTION

The vector of the invention is preferably an autonomously replicating episomal or 
extrachromosomal vector, such as a plasmid. 

The vector of the invention preferably comprises an origin of replication. It is preferred 
that the origin of replication is active in prokaryotes but not in eukaryotes.

Preferred vectors thus include a prokaryotic marker for selection of the vector, a 
prokaryotic origin of replication, but a eukaryotic promoter for driving transcription of the 
HML-2 coding sequence. The vectors will therefore (a) be amplified and selected in prokaryotic 
hosts without HML-2 polypeptide expression, but (b) be expressed in eukaryotic hosts without 
being amplified. This is ideal for nucleic acid immunization vectors.

The vector of the invention may comprise a eukaryotic transcriptional terminator sequence 
downstream of the HML2-coding sequence. This can enhance transcription levels. Where the 
HML2-coding sequence does not have its own, the vector of the invention preferably comprises 
a polyadenylation sequence. A preferred polyadenylation sequence is from bovine growth 
hormone.

The vector of the invention may comprise a multiple cloning site.

In addition to sequences encoding a HML-2 polypeptide and a marker, the vector may 
comprise a second eukaryotic coding sequence. The vector may also comprise an IRES upstream 
of said second sequence in order to permit translation of a second eukaryotic polypeptide from 
the same transcript as the HML-2 polypeptide. Alternatively, the HML-2 polypeptide may be
downstream of an IRES.

The vector of the invention may comprise unmethylated CpG motifs e.g. unmethylated 
DNA sequences which have in common a cytosine preceding a guanosine, flanked by two 5'
purines and two 3' pyrimidines. In their unmethylated form these DNA motifs have been
demonstrated to be potent stimulators of several types of immune cell.
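
The motif described above (two 5' purines, a CG dinucleotide, two 3' pyrimidines) can be located in a candidate vector sequence with a simple pattern search. The following is a minimal sketch under stated assumptions: the function name and the test sequence are hypothetical, only unambiguous A/C/G/T bases are handled, and a sequence scan of this kind says nothing about methylation status, which has to be established experimentally.

```python
import re

# Two purines (A or G), then CG, then two pyrimidines (C or T).
# The lookahead lets overlapping occurrences be reported as well.
CPG_MOTIF = re.compile(r"(?=([AG]{2}CG[CT]{2}))")


def find_cpg_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start position, hexamer) for each motif on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CPG_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence for illustration only.
    print(find_cpg_motifs("TTGACGTTAA"))
```

Only the strand supplied is scanned; to cover the complementary strand as well, the reverse complement would need to be passed in separately.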

PHARMACEUTICAL COMPOSITIONS 

The invention provides a pharmaceutical composition comprising a vector of the invention. 
The invention also provides the vectors' use as medicaments, and their use in the manufacture of 
medicaments for treating prostate cancer. The invention also provides a method for treating a 
patient with a prostate tumor, comprising administering to them a pharmaceutical composition of
the invention. The patient is generally a human, preferably a human male, and more preferably 
an adult human male. Other diseases in which HERV-Ks have been implicated include testicular 
cancer [36], multiple sclerosis [37], and insulin-dependent diabetes mellitus (IDDM) [38], and 
the vectors may also be used against these diseases. 

The invention also provides a method for raising an immune response, comprising
administering an immunogenic dose of a vector of the invention to an animal (e.g. to a human). 

Pharmaceutical compositions encompassed by the present invention include, as active
agent, the vectors of the invention in a therapeutically effective amount. An "effective amount"
is an amount sufficient to effect beneficial or desired results, including clinical results. An
effective amount can be administered in one or more administrations. For purposes of this
invention, an effective amount is an amount that is sufficient to palliate, ameliorate, stabilize,
reverse, slow or delay the symptoms and/or progression of prostate cancer. The effect can be
detected by, for example, chemical markers or antigen levels. Therapeutic effects also include
reduction in physical symptoms.

The precise effective amount for a subject will depend upon the subject's size and health,
the nature and extent of the condition, and the therapeutics or combination of therapeutics
selected for administration. The effective amount for a given situation is determined by routine
experimentation and is within the judgment of the clinician. For purposes of the present
invention, an effective dose will generally be from about 0.01 mg/kg to about 5 mg/kg, or about
0.01 mg/kg to about 50 mg/kg or about 0.05 mg/kg to about 10 mg/kg of the compositions of the
present invention in the individual to which it is administered.
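
The per-kilogram ranges above translate into total amounts by simple multiplication with body weight. The following sketch shows only that arithmetic; the function name, the constant name and the 70 kg example weight are illustrative assumptions, and nothing here is dosing guidance.

```python
# The three ranges stated in the text, as (low, high) in mg per kg body weight.
DOSE_RANGES_MG_PER_KG = [(0.01, 5.0), (0.01, 50.0), (0.05, 10.0)]


def total_dose_range_mg(body_weight_kg: float,
                        low_mg_per_kg: float,
                        high_mg_per_kg: float) -> tuple[float, float]:
    """Convert a mg/kg range into a total mg range for one body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("low end of the range exceeds the high end")
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)


if __name__ == "__main__":
    weight_kg = 70.0  # hypothetical example weight
    for low, high in DOSE_RANGES_MG_PER_KG:
        lo_mg, hi_mg = total_dose_range_mg(weight_kg, low, high)
        print(f"{low}-{high} mg/kg at {weight_kg} kg: {lo_mg:.2f}-{hi_mg:.2f} mg")
```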

The compositions can be used to treat cancer as well as metastases of primary cancer. In 
addition, the pharmaceutical compositions can be used in conjunction with conventional methods
of cancer treatment, e.g. to sensitize tumors to radiation or conventional chemotherapy. The
terms "treatment", "treating", "treat" and the like are used herein to generally refer to obtaining a 
desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of 
completely or partially preventing a disease or symptom thereof and/or may be therapeutic in 
terms of a partial or complete stabilization or cure for a disease and/or adverse effect attributable
to the disease. "Treatment" as used herein covers any treatment of a disease in a mammal,
particularly a human, and includes: (a) preventing the disease or symptom from occurring in a 
subject which may be predisposed to the disease or symptom but has not yet been diagnosed as 
having it; (b) inhibiting the disease symptom, i.e. arresting its development; or (c) relieving the 
disease symptom, i.e. causing regression of the disease or symptom. 

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The
term "pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic
agent, such as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to
any pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which can be administered without undue toxicity.

Suitable carriers can be large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid
copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill 
in the art. Pharmaceutically acceptable carriers in therapeutic compositions can include liquids 
such as water, saline, glycerol and ethanol. Auxiliary substances, such as wetting or emulsifying 
agents, pH buffering substances, and the like, can also be present in such vehicles. Typically, the
therapeutic compositions are prepared as injectables, either as liquid solutions or suspensions; 
solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection can also be 
prepared. Liposomes are included within the definition of a pharmaceutically acceptable carrier. 
Pharmaceutically acceptable salts can also be present in the pharmaceutical composition, e.g. 
mineral acid salts such as hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and
the salts of organic acids such as acetates, propionates, malonates, benzoates, and the like. A 
thorough discussion of pharmaceutically acceptable excipients is available in reference 39. 

The composition is preferably sterile and/or pyrogen-free. It will typically be buffered at 
about pH 7. 

Once formulated, the compositions contemplated by the invention can be (1) administered
directly to the subject; or (2) delivered ex vivo, to cells derived from the subject (e.g. as in ex vivo
gene therapy). Direct delivery of the compositions will generally be accomplished by parenteral
injection, e.g. subcutaneously, intraperitoneally, intravenously, intramuscularly, intratumorally
or to the interstitial space of a tissue. Other modes of administration include oral and pulmonary
administration, suppositories, and transdermal applications, needles, and gene guns or
hyposprays. Dosage treatment can be a single dose schedule or a multiple dose schedule.

Intramuscular injection is preferred. 

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are 
known in the art [e.g. ref. 40]. Cells useful in ex vivo applications include, for
example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic cells, or
tumor cells. Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be 
accomplished by, for example, dextran-mediated transfection, calcium phosphate precipitation, 
polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of the nucleic 
acid(s) in liposomes, and direct microinjection of the DNA into nuclei, all well known in the art. 

Targeted delivery

Vectors of the invention may be delivered in a targeted way. 

Receptor-mediated DNA delivery techniques are described in, for example, references 41 
to 46. Therapeutic compositions containing a nucleic acid are administered in a range of about
100 ng to about 200 mg of DNA for local administration in a gene therapy protocol.
Concentration ranges of about 500 ng to about 50 mg, about 1 µg to about 2 mg, about 5 µg to
about 500 µg, and about 20 µg to about 100 µg of DNA can also be used during a gene therapy
protocol. Factors such as method of action (e.g. for enhancing or inhibiting levels of the encoded
gene product) and efficacy of transformation and expression are considerations which will affect
the dosage required for ultimate efficacy. Where greater expression is desired over a larger area
of tissue, larger amounts of vector or the same amounts re-administered in a successive protocol
of administrations, or several administrations to different adjacent or close tissue portions of e.g.
a tumor site, may be required to effect a positive therapeutic outcome. In all cases, routine
experimentation in clinical trials will determine specific ranges for optimal therapeutic effect.

Vectors can be delivered using gene delivery vehicles. The gene delivery vehicle can be of
viral or non-viral origin (see generally references 47 to 50).

Viral-based vectors for delivery of a desired nucleic acid and expression in a desired cell 
are well known in the art. Exemplary viral-based vehicles include, but are not limited to, 
recombinant retroviruses (e.g. references 51 to 61), alphavirus-based vectors (e.g. Sindbis virus
vectors, Semliki forest virus (ATCC VR-67; ATCC VR-1247), Ross River virus (ATCC
VR-373; ATCC VR-1246) and Venezuelan equine encephalitis virus (ATCC VR-923; ATCC
VR-1250; ATCC VR-1249; ATCC VR-532); hybrids or chimeras of these viruses may also be used),
poxvirus vectors (e.g. vaccinia, fowlpox, canarypox, modified vaccinia Ankara, etc.), adenovirus
vectors, and adeno-associated virus (AAV) vectors (e.g. see refs. 62 to 67). Administration of
DNA linked to killed adenovirus [68] can also be employed.

Non-viral delivery vehicles and methods can also be employed, including, but not limited
to, polycationic condensed DNA linked or unlinked to killed adenovirus alone [e.g. 68],
ligand-linked DNA [69], eukaryotic cell delivery vehicles [e.g. refs. 70 to 74] and nucleic charge
neutralization or fusion with cell membranes. Naked DNA can also be employed. Exemplary
naked DNA introduction methods are described in refs. 75 and 76. Liposomes (e.g.
immunoliposomes) that can act as gene delivery vehicles are described in refs. 77 to 81.
Additional approaches are described in refs. 82 & 83.

Further non-viral delivery suitable for use includes mechanical delivery systems such as 
the approach described in ref. 83. Moreover, the coding sequence and the product of expression 
of such can be delivered through deposition of photopolymerized hydrogel materials or use of
ionizing radiation [e.g. refs. 84 & 85]. Other conventional methods for gene delivery that can be 
used for delivery of the coding sequence include, for example, use of hand-held gene transfer 
particle gun [86] or use of ionizing radiation for activating transferred genes [84 & 87]. 

Delivery of DNA using PLG {poly(lactide-co-glycolide)} microparticles is a particularly
preferred method e.g. by adsorption to the microparticles, which are optionally treated to have a
negatively-charged surface (e.g. treated with SDS) or a positively-charged surface (e.g. treated
with a cationic detergent, such as CTAB).

Vaccine compositions

The pharmaceutical composition is preferably an immunogenic composition and is more
preferably a vaccine composition. Such compositions can be used to raise antibodies in a
mammal (e.g. a human) and/or to raise a cellular immune response (e.g. a response involving
T-cells such as CTLs, a response involving natural killer cells, a response involving
macrophages etc.).

The invention provides the use of a vector of the invention in the manufacture of 
medicaments for preventing prostate cancer. The invention also provides a method for protecting
a patient from prostate cancer, comprising administering to them a pharmaceutical composition 
of the invention. 

Nucleic acid immunization is well known [e.g. refs. 88 to 94 etc.].

The composition may additionally comprise an adjuvant. For example, the composition
may comprise one or more of the following adjuvants: (1) oil-in-water emulsion formulations
(with or without other specific immunostimulating agents such as muramyl peptides (see below)
or bacterial cell wall components), such as for example (a) MF59™ [95; Chapter 10 in ref. 96],
containing 5% Squalene, 0.5% Tween 80, and 0.5% Span 85 (optionally containing MTP-PE)
formulated into submicron particles using a microfluidizer, (b) SAF, containing 10% Squalane,
0.4% Tween 80, 5% pluronic-blocked polymer L121, and thr-MDP either microfluidized into a
submicron emulsion or vortexed to generate a larger particle size emulsion, and (c) Ribi™
adjuvant system (RAS), (Ribi Immunochem, Hamilton, MT) containing 2% Squalene, 0.2%
Tween 80, and one or more bacterial cell wall components from the group consisting of
monophosphoryl lipid A (MPL), trehalose dimycolate (TDM), and cell wall skeleton (CWS),
preferably MPL + CWS (Detox™); (2) saponin adjuvants, such as QS21 or Stimulon™
(Cambridge Bioscience, Worcester, MA) may be used or particles generated therefrom such as
ISCOMs (immunostimulating complexes), which ISCOMs may be devoid of additional
detergent [97]; (3) Complete Freund's Adjuvant (CFA) and Incomplete Freund's Adjuvant
(IFA); (4) cytokines, such as interleukins (e.g. IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12 etc.),
interferons (e.g. gamma interferon), macrophage colony stimulating factor (M-CSF), tumor
necrosis factor (TNF), etc.; (5) monophosphoryl lipid A (MPL) or 3-O-deacylated MPL
(3dMPL) [e.g. 98, 99]; (6) combinations of 3dMPL with, for example, QS21 and/or oil-in-water
emulsions [e.g. 100, 101, 102]; (7) oligonucleotides comprising CpG motifs i.e. containing at
least one CG dinucleotide, with 5-methylcytosine optionally being used in place of cytosine; (8)
a polyoxyethylene ether or a polyoxyethylene ester [103]; (9) a polyoxyethylene sorbitan ester
surfactant in combination with an octoxynol [104] or a polyoxyethylene alkyl ether or ester
surfactant in combination with at least one additional non-ionic surfactant such as an octoxynol
[105]; (10) an immunostimulatory oligonucleotide (e.g. a CpG oligonucleotide) and a saponin
[106]; (11) an immunostimulant and a particle of metal salt [107]; (12) a saponin and an
oil-in-water emulsion [108]; (13) a saponin (e.g. QS21) + 3dMPL + IL-12 (optionally + a sterol) [109];
(14) aluminium salts, preferably hydroxide or phosphate, but any other suitable salt may also be
used (e.g. hydroxyphosphate, oxyhydroxide, orthophosphate, sulphate etc. [chapters 8 & 9 of ref.
96]). Mixtures of different aluminium salts may also be used. The salt may take any suitable
form (e.g. gel, crystalline, amorphous etc.); (15) chitosan; (16) cholera toxin or E. coli heat labile
toxin, or detoxified mutants thereof [110]; (17) microparticles (i.e. a particle of ~100 nm to
~150 µm in diameter, more preferably ~200 nm to ~30 µm in diameter, and most preferably
~500 nm to ~10 µm in diameter) formed from materials that are biodegradable and non-toxic (e.g.
a poly(α-hydroxy acid), a polyhydroxybutyric acid, a polyorthoester, a polyanhydride, a
polycaprolactone etc., such as poly(lactide-co-glycolide) etc.) optionally treated to have a
negatively-charged surface (e.g. with SDS) or a positively-charged surface (e.g. with a cationic
detergent, such as CTAB); (18) monophosphoryl lipid A mimics, such as aminoalkyl
glucosaminide phosphate derivatives e.g. RC-529 [111]; (19) polyphosphazene (PCPP); (20) a
bioadhesive [112] such as esterified hyaluronic acid microspheres [113] or a mucoadhesive
selected from the group consisting of cross-linked derivatives of poly(acrylic acid), polyvinyl
alcohol, polyvinyl pyrrolidone, polysaccharides and carboxymethylcellulose; (21)
double-stranded RNA; or (22) other substances that act as immunostimulating agents to enhance the
efficacy of the composition. Aluminium salts and/or MF59™ are preferred.

Vaccines of the invention may be prophylactic (i.e. to prevent disease) or therapeutic (i.e. 
to reduce or eliminate the symptoms of a disease). 

SPECIFIC VECTORS OF THE INVENTION

Preferred vectors of the invention comprise: (i) a eukaryotic promoter; (ii) a sequence 
encoding a HML-2 polypeptide downstream of and operably linked to said promoter; (iii) a 
prokaryotic selectable marker; (iv) a prokaryotic origin of replication; and (v) a eukaryotic 
transcription terminator downstream of and operably linked to said sequence encoding a HML-2
polypeptide.
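
The five-element layout of the preferred vectors can be written down as a small data model, which makes the eukaryotic/prokaryotic split easy to check. This is a minimal sketch under stated assumptions: the class names, field names and element labels are hypothetical conveniences for illustration and do not describe any of the vectors shown in the figures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    name: str    # free-text label for the element
    domain: str  # "eukaryotic" or "prokaryotic"


@dataclass(frozen=True)
class Vector:
    promoter: Element               # (i)
    coding_sequence: str            # (ii) label for the HML-2 coding sequence
    selectable_marker: Element      # (iii)
    origin_of_replication: Element  # (iv)
    terminator: Element             # (v)


def is_preferred_layout(v: Vector) -> bool:
    """Check the split described in the text: (i) and (v) eukaryotic,
    (iii) and (iv) prokaryotic, with a coding sequence present."""
    return (
        bool(v.coding_sequence)
        and v.promoter.domain == "eukaryotic"
        and v.terminator.domain == "eukaryotic"
        and v.selectable_marker.domain == "prokaryotic"
        and v.origin_of_replication.domain == "prokaryotic"
    )


# Hypothetical example; the labels are placeholders, not a disclosed construct.
example = Vector(
    promoter=Element("CMV enhancer/promoter", "eukaryotic"),
    coding_sequence="HML-2 gag",
    selectable_marker=Element("antibiotic resistance gene", "prokaryotic"),
    origin_of_replication=Element("bacterial origin", "prokaryotic"),
    terminator=Element("transcription terminator", "eukaryotic"),
)

if __name__ == "__main__":
    print(is_preferred_layout(example))
```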

Particularly preferred vectors are shown in figures 2 to 8 (SEQ IDs 51 to 56 & 80). 

VIRUS-LIKE PARTICLES

HML-2 gag polypeptide has been found to assemble into virus-like particles (VLPs). This 
particulate form of the polypeptide has enhanced immunogenicity when compared to soluble 
polypeptide and is a preferred form of polypeptide for use in immunization and/or diagnosis. 

Thus the invention provides a virus-like particle, comprising HML-2 gag polypeptide. The
gag polypeptide may be myristoylated at its N-terminus.

The invention also provides a VLP of the invention for use as an immunogen or for use as 
a diagnostic antigen. The invention also provides the use of a VLP of the invention in the 
manufacture of a medicament for immunizing an animal. 

The invention also provides a method of raising an immune response in an animal,
comprising administering to the animal a VLP of the invention. The immune response may
comprise a humoral immune response and/or a cellular immune response. 

For raising an immune response, the VLP may be administered with or without an adjuvant,
as disclosed above. The immune response may treat or protect against cancer (e.g. prostate
cancer).

The invention also provides a method for diagnosing cancer (e.g. prostate cancer) in a 
patient, comprising the step of contacting antibodies from the patient with VLPs of the invention. 
Similarly, the invention provides a method for diagnosing cancer (e.g. prostate cancer) in a 
patient, comprising the step of contacting anti-VLP antibodies with a patient sample. 

The invention also provides a process for preparing VLPs of the invention, comprising the
step of expressing gag polypeptide in a cell, and collecting VLPs from the cell. Expression may
be achieved using a vector of the invention. 

The VLP of the invention may or may not include packaged nucleic acid.

The gag polypeptide from which the VLPs are made can be from any suitable HML-2
virus (e.g. SEQ IDs 1-9, 69 & 78).

DEFINITIONS 

The term "comprising" means "including" as well as "consisting" e.g. a composition 
"comprising" X may consist exclusively of X or may include something additional e.g. X + Y. 

The term "about" in relation to a numerical value x means, for example, x±10%.

The terms "neoplastic cells", "neoplasia", "tumor", "tumor cells", "cancer" and "cancer
cells" (used interchangeably) refer to cells which exhibit relatively autonomous growth, so that
they exhibit an aberrant growth phenotype characterized by a significant loss of control of cell
proliferation (i.e. de-regulated cell division). Neoplastic cells can be malignant or benign and
include prostate cancer derived tissue.

References to a percentage sequence identity between two nucleic acid sequences mean
that, when aligned, that percentage of bases are the same in comparing the two sequences. This
alignment and the percent homology or sequence identity can be determined using software
programs known in the art, for example those described in section 7.7.18 of reference 114. A
preferred alignment program is GCG Gap (Genetics Computer Group, Wisconsin, Suite Version
10.1), preferably using default parameters, which are as follows: open gap = 3; extend gap = 1.
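
As an illustration of the identity definition above, the following minimal Python sketch counts identical positions in two sequences that have already been aligned. It is not the GCG Gap program. The function name `percent_identity`, the `-` gap symbol and the choice to compare only columns in which both sequences have a residue are assumptions made for this example; other programs may use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of compared columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns containing a gap in either sequence are not compared.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# First 12 bases of the wild-type and optimized cORF sequences shown later in the text.
print(round(percent_identity("ATGAACCCATCA", "ATGAACCCCAGC"), 2))
```

In this usage example eight of the twelve compared bases are identical.
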

References to a percentage sequence identity between two amino acid sequences mean
that, when aligned, that percentage of amino acids are the same in comparing the two sequences.
This alignment and the percent homology or sequence identity can be determined using software
programs known in the art, for example those described in section 7.7.18 of reference 114. A
preferred alignment is determined by the Smith-Waterman homology search algorithm using an
affine gap search with a gap open penalty of 12 and a gap extension penalty of 2, BLOSUM
matrix of 62. The Smith-Waterman homology search algorithm is taught in reference 115.
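
The affine gap search mentioned above can be sketched in Python as follows. This is a minimal illustration of Smith-Waterman local alignment scoring with affine gaps (the Gotoh recurrences), not the implementation of reference 115. Two assumptions are made: a gap of length k costs gap_open + (k - 1) x gap_extend (conventions differ between programs), and a toy scoring function `toy_score` (+5 for a match, -4 for a mismatch) stands in for the BLOSUM62 matrix, which is too large to reproduce here.

```python
def toy_score(x: str, y: str) -> int:
    """Hypothetical stand-in for a BLOSUM62 lookup: +5 match, -4 mismatch."""
    return 5 if x == y else -4


def smith_waterman_affine(a, b, score=toy_score, gap_open=12, gap_extend=2):
    """Return the best local alignment score of a and b under affine gap penalties."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j), floored at 0 (local alignment).
    # E: best score ending at (i, j) with a[i] unused, i.e. a gap consuming b[j].
    # F: best score ending at (i, j) with b[j] unused, i.e. a gap consuming a[i].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap (cost gap_open) or extend an existing one (cost gap_extend).
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Eight matching residues (8 x 5) minus one single-residue gap (12).
print(smith_waterman_affine("ACDEFGHIK", "ACDEGHIK"))
```

With a real substitution matrix, `score` would be replaced by a lookup into that matrix; the recurrences stay the same.
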

BRIEF DESCRIPTION OF DRAWINGS 

Figure 1 shows the pCMVkm2 vector, and Figures 2 to 8 show vectors formed by inserting 
sequences encoding HML-2 polypeptides into this vector. 

Figure 9 shows the location of coding sequences in the HML2.HOM genome, with
nucleotide numbering according to ref. 5.

Figure 10 is a western blot showing gag expression in transfected 293 cells. Lanes 1 to 4 
are: (1) gag opt HML-2; (2) gag opt PCAV; (3) gag wt PCAV; (4) mock. 

Figure 11 also shows western blots of transfected 293 cells. In Figure 11A the staining
antibody was anti-HML-2, but in Figure 11B it was anti-PCAV. In both 11A and 11B lanes 1 to
4 are: (1) mock; (2) gag opt HML-2; (3) gag opt PCAV; (4) gag wt PCAV. The upper arrow
shows the position of gag; the lower arrow shows the β-actin control.

Figure 12 shows electron microscopy of 293 cells expressing (12A) gag opt PCAV or
(12B) gag opt HML-2.

MODES FOR CARRYING OUT THE INVENTION 

Certain aspects of the present invention are described in greater detail in the non-limiting
examples that follow. The examples are put forth so as to provide those of ordinary skill in the
art with a disclosure and description of how to make and use the present invention, and are not
intended to limit the scope of what the inventors regard as their invention nor are they intended
to represent that the experiments below are all and only experiments performed. Efforts have
been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but
some experimental errors and deviations should be accounted for. Unless indicated otherwise,
parts are parts by weight, molecular weight is weight average molecular weight, temperature is in
degrees Celsius, and pressure is at or near atmospheric.

Vectors for expressing HML-2 polypeptides

The basic pCMVkm2 vector is shown in figure 1. This vector has an immediate-early
CMV enhancer/promoter and a bovine growth hormone transcription terminator, with a multiple
cloning site in between. The vector also has a kanamycin resistance gene and a ColE1 origin of
replication.

Sequences coding for HML-2 polypeptides were inserted between SalI and EcoRI in the
multiple cloning site:



Figure    SEQ ID    HML-2 polypeptide
2         51        cORF
3         52        PCAP5
4         53        gag
5         54        gag
6         55        Prt
7         56        Pol



Except for the vector shown in figure 4 (SEQ ID 53), the inserted sequences were
manipulated for codon preference, including addition of an optimal stop codon:

cORF manipulation:

Start with SEQ ID 57 (SEQ ID 43); manipulate to SEQ ID 58 (SEQ ID 67):

ATGAACCCATCAGAGATGCAAAGAAAAGCACCTCCGCGGAGACGGAGACATC cORFwt_hml (1)
ATGAACCCCAGCGAGATGCAGCGCAAGGCCCCCCCCCGCCGCCGCCGCCACC corfopt_hml (1)

GCAATCGAGCACCGTTGACTCACAAGATGAACAAAATGGTGACGTCAGAAGA cORFwt_hml (53)
GCAACCGCGCCCCCCTGACCCACAAGATGAACAAGATGGTGACCAGCGAGGA corfopt_hml (53)

ACAGATGAAGTTGCCATCCACCAAGAAGGCAGAGCCGCCAACTTGGGCACAA cORFwt_hml (105)
GCAGATGAAGCTGCCCAGCACCAAGAAGGCCGAGCCCCCCACCTGGGCCCAG corfopt_hml (105)

CTAAAGAAGCTGACGCAGTTAGCTACAAAATATCTAGAGAACACAAAGGTGA cORFwt_hml (157)
CTGAAGAAGCTGACCCAGCTGGCCACCAAGTACCTGGAGAACACCAAGGTGA corfopt_hml (157)

CACAAACCCCAGAGAGTATGCTGCTTGCAGCCTTGATGATTGTATCAATGGT cORFwt_hml (209)
CCCAGACCCCCGAGAGCATGCTGCTGGCCGCCCTGATGATCGTGAGCATGGT corfopt_hml (209)

GTCTGCAGGTGTACCCAACAGCTCCGAAGAGACAGCGACCATCGAGAACGGG cORFwt_hml (261)
GAGCGCCGGCGTGCCCAACAGCAGCGAGGAGACCGCCACCATCGAGAACGGC corfopt_hml (261)

CCA   TGA cORFwt_hml (313)
CCCGCTTAA corfopt_hml (313)
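
The kind of codon manipulation shown in the alignment above can be sketched in Python as follows. This is an illustrative reconstruction, not the procedure actually used to derive the optimized sequences. The one-codon-per-amino-acid table `PREFERRED` is inferred from the optimized sequences in the alignments, and the default `GCTTAA` tail mirrors the way the optimized sequences shown end (an added GCT codon followed by a TAA stop); both, and the function name `recode`, are assumptions of this example.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumption: one preferred codon per amino acid, as read off the optimized sequences.
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def recode(cds: str, tail: str = "GCTTAA") -> str:
    """Re-encode an in-frame coding sequence with the preferred codons.

    If a stop codon is reached, recoding ends there and `tail` is appended;
    a fragment without a stop codon is returned without a tail.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for pos in range(0, len(cds), 3):
        aa = CODON_TO_AA[cds[pos:pos + 3]]
        if aa == "*":
            return "".join(out) + tail
        out.append(PREFERRED[aa])
    return "".join(out)


# First 17 codons of the wild-type cORF sequence shown above.
print(recode("ATGAACCCATCAGAGATGCAAAGAAAAGCACCTCCGCGGAGACGGAGACAT"))
```

Applied to the first 17 codons of the wild-type cORF sequence, the sketch reproduces the corresponding stretch of the optimized sequence.
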
PCAP5 manipulation:

Start with SEQ ID 59 (SEQ ID 37); manipulate to SEQ ID 60 (SEQ ID 68):

ATGAACCCATCGGAGATGCAAAGAAAAGCACCTCCGCGGAGACGGAGACAT pCAP5wt_hml (1)
ATGAACCCCAGCGAGATGCAGCGCAAGGCCCCCCCCCGCCGCCGCCGCCAC pcap5opt_hml (1)

CGCAATCGAGCACCGTTGACTCACAAGATGAACAAAATGGTGACGTCAGAA pCAP5wt_hml (52)
CGCAACCGCGCCCCCCTGACCCACAAGATGAACAAGATGGTGACCAGCGAG pcap5opt_hml (52)

GAACAGATGAAGTTGCCATCCACCAAGAAGGCAGAGCCGCCAACTTGGGCA pCAP5wt_hml (103)
GAGCAGATGAAGCTGCCCAGCACCAAGAAGGCCGAGCCCCCCACCTGGGCC pcap5opt_hml (103)

CAACTAAAGAAGCTGACGCAGTTAGCTACAAAATATCTAGAGAACACAAAG pCAP5wt_hml (154)
CAGCTGAAGAAGCTGACCCAGCTGGCCACCAAGTACCTGGAGAACACCAAG pcap5opt_hml (154)

GTGACACAAACCCCAGAGAGTATGCTGCTTGCAGCCTTGATGATTGTATCA pCAP5wt_hml (205)
GTGACCCAGACCCCCGAGAGCATGCTGCTGGCCGCCCTGATGATCGTGAGC pcap5opt_hml (205)

ATGGTGGTGTACCCAACAGCTCCGAAGAGACAGCGACCATCGAGAACGGGC pCAP5wt_hml (256)
ATGGTGGTGTACCCCACCGCCCCCAAGCGCCAGCGCCCCAGCCGCACCGGC pcap5opt_hml (256)

CATGATGACGATGGCGGTTTTGTCGAAAAGAAAAGGGGGAAATGTGGGGAA pCAP5wt_hml (307)
CACGACGACGACGGCGGCTTCGTGGAGAAGAAGCGCGGCAAGTGCGGCGAG pcap5opt_hml (307)

AAGCAAGAGAGATCAGATTGTTACTGTGTCTGTGTAGAAAGAAGTAGACAT pCAP5wt_hml (358)
AAGCAGGAGCGCAGCGACTGCTACTGCGTGTGCGTGGAGCGCAGCCGCCAC pcap5opt_hml (358)

AGGAGACTCCATTTTGTTCTGTAC   TAA pCAP5wt_hml (409)
CGCCGCCTGCACTTCGTGCTGTACGCTTAA pcap5opt_hml (409)

Gag manipulation:

Start with SEQ ID 61 (SEQ ID 69); manipulate to SEQ ID 62 (SEQ ID 70):

ATGGGGCAAACTAAAAGTAAAATTAAAAGTAAATATGCCTCTTATCTCAGCT gagwt_hml (1)
ATGGGCCAGACCAAGAGCAAGATCAAGAGCAAGTACGCCAGCTACCTGAGCT gagopt_hml (1)

TTATTAAAATTCTTTTAAAAAGAGGGGGAGTTAAAGTATCTACAAAAAATCT gagwt_hml (53)
TCATCAAGATCCTGCTGAAGCGCGGCGGCGTGAAGGTGAGCACCAAGAACCT gagopt_hml (53)

AATCAAGCTATTTCAAATAATAGAACAATTTTGCCCATGGTTTCCAGAACAA gagwt_hml (105)
GATCAAGCTGTTCCAGATCATCGAGCAGTTCTGCCCCTGGTTCCCCGAGCAG gagopt_hml (105)

GGAACTTTAGATCTAAAAGATTGGAAAAGAATTGGTAAGGAACTAAAACAAG gagwt_hml (157)
GGCACCCTGGACCTGAAGGACTGGAAGCGCATCGGCAAGGAGCTGAAGCAGG gagopt_hml (157)

CAGGTAGGAAGGGTAATATCATTCCACTTACAGTATGGAATGATTGGGCCAT gagwt_hml (209)
CCGGCCGCAAGGGCAACATCATCCCCCTGACCGTGTGGAACGACTGGGCCAT gagopt_hml (209)

TATTAAAGCAGCTTTAGAACCATTTCAAACAGAAGAAGATAGCGTTTCAGTT gagwt_hml (261)
CATCAAGGCCGCCCTGGAGCCCTTCCAGACCGAGGAGGACAGCGTGAGCGTG gagopt_hml (261)

TCTGATGCCCCTGGAAGCTGTATAATAGATTGTAATGAAAACACAAGGAAAA gagwt_hml (313)
AGCGACGCCCCCGGCAGCTGCATCATCGACTGCAACGAGAACACCCGCAAGA gagopt_hml (313)

AATCCCAGAAAGAAACGGAAGGTTTACATTGCGAATATGTAGCAGAGCCGGT gagwt_hml (365)
AGAGCCAGAAGGAGACCGAGGGCCTGCACTGCGAGTACGTGGCCGAGCCCGT gagopt_hml (365)

AATGGCTCAGTCAACGCAAAATGTTGACTATAATCAATTACAGGAGGTGATA gagwt_hml (417)
GATGGCCCAGAGCACCCAGAACGTGGACTACAACCAGCTGCAGGAGGTGATC gagopt_hml (417)

TATCCTGAAACGTTAAAATTAGAAGGAAAAGGTCCAGAATTAGTGGGGCCAT gagwt_hml (469)
TACCCCGAGACCCTGAAGCTGGAGGGCAAGGGCCCCGAGCTGGTGGGCCCCA gagopt_hml (469)

CAGAGTCTAAACCACGAGGCACAAGTCCTCTTCCAGCAGGTCAGGTGCCTGT gagwt_hml (521)
GCGAGAGCAAGCCCCGCGGCACCAGCCCCCTGCCCGCCGGCCAGGTGCCCGT gagopt_hml (521)

AACATTACAACCTCAAAAGCAGGTTAAAGAAAATAAGACCCAACCGCCAGTA gagwt_hml (573)
GACCCTGCAGCCCCAGAAGCAGGTGAAGGAGAACAAGACCCAGCCCCCCGTG gagopt_hml (573)

GCCTATCAATACTGGCCTCCGGCTGAACTTCAGTATCGGCCACCCCCAGAAA gagwt_hml (625)
GCCTACCAGTACTGGCCCCCCGCCGAGCTGCAGTACCGCCCCCCCCCCGAGA gagopt_hml (625)

GTCAGTATGGATATCCAGGAATGCCCCCAGCACCACAGGGCAGGGCGCCATA gagwt_hml (677)
GCCAGTACGGCTACCCCGGCATGCCCCCCGCCCCCCAGGGCCGCGCCCCCTA gagopt_hml (677)

CCCTCAGCCGCCCACTAGGAGACTTAATCCTACGGCACCACCTAGTAGACAG gagwt_hml (729)
CCCCCAGCCCCCCACCCGCCGCCTGAACCCCACCGCCCCCCCCAGCCGCCAG gagopt_hml (729)

GGTAGTAAATTACATGAAATTATTGATAAATCAAGAAAGGAAGGAGATACTG gagwt_hml (781)
GGCAGCAAGCTGCACGAGATCATCGACAAGAGCCGCAAGGAGGGCGACACCG gagopt_hml (781)

AGGCATGGCAATTCCCAGTAACGTTAGAACCGATGCCACCTGGAGAAGGAGC gagwt_hml (833)
AGGCCTGGCAGTTCCCCGTGACCCTGGAGCCCATGCCCCCCGGCGAGGGCGC gagopt_hml (833)

CCAAGAGGGAGAGCCTCCCACAGTTGAGGCCAGATACAAGTCTTTTTCGATA gagwt_hml (885)
CCAGGAGGGCGAGCCCCCCACCGTGGAGGCCCGCTACAAGAGCTTCAGCATC gagopt_hml (885)

AAAAAGCTAAAAGATATGAAAGAGGGAGTAAAACAGTATGGACCCAACTCCC gagwt_hml (937)
AAGAAGCTGAAGGACATGAAGGAGGGCGTGAAGCAGTACGGCCCCAACAGCC gagopt_hml (937)

CTTATATGAGGACATTATTAGATTCCATTGCTCATGGACATAGACTCATTCC gagwt_hml (989)
CCTACATGCGCACCCTGCTGGACAGCATCGCCCACGGCCACCGCCTGATCCC gagopt_hml (989)

TTATGATTGGGAGATTCTGGCAAAATCGTCTCTCTCACCCTCTCAATTTTTA gagwt_hml (1041)
CTACGACTGGGAGATCCTGGCCAAGAGCAGCCTGAGCCCCAGCCAGTTCCTG gagopt_hml (1041)

CAATTTAAGACTTGGTGGATTGATGGGGTACAAGAACAGGTCCGAAGAAATA gagwt_hml (1093)
CAGTTCAAGACCTGGTGGATCGACGGCGTGCAGGAGCAGGTGCGCCGCAACC gagopt_hml (1093)

GGGCTGCCAATCCTCCAGTTAACATAGATGCAGATCAACTATTAGGAATAGG gagwt_hml (1145)
GCGCCGCCAACCCCCCCGTGAACATCGACGCCGACCAGCTGCTGGGCATCGG gagopt_hml (1145)

TCAAAATTGGAGTACTATTAGTCAACAAGCATTAATGCAAAATGAGGCCATT gagwt_hml (1197)
CCAGAACTGGAGCACCATCAGCCAGCAGGCCCTGATGCAGAACGAGGCCATC gagopt_hml (1197)

GAGCAAGTTAGAGCTATCTGCCTTAGAGCCTGGGAAAAAATCCAAGACCCAG gagwt_hml (1249)
GAGCAGGTGCGCGCCATCTGCCTGCGCGCCTGGGAGAAGATCCAGGACCCCG gagopt_hml (1249)

GAAGTACCTGCCCCTCATTTAATACAGTAAGACAAGGTTCAAAAGAGCCCTA gagwt_hml (1301)
GCAGCACCTGCCCCAGCTTCAACACCGTGCGCCAGGGCAGCAAGGAGCCCTA gagopt_hml (1301)

TCCTGATTTTGTGGCAAGGCTCCAAGATGTTGCTCAAAAGTCAATTGCTGAT gagwt_hml (1353)
CCCCGACTTCGTGGCCCGCCTGCAGGACGTGGCCCAGAAGAGCATCGCCGAC gagopt_hml (1353)

GAAAAAGCCCGTAAGGTCATAGTGGAGTTGATGGCATATGAAAACGCCAATC gagwt_hml (1405)
GAGAAGGCCCGCAAGGTGATCGTGGAGCTGATGGCCTACGAGAACGCCAACC gagopt_hml (1405)

CTGAGTGTCAATCAGCCATTAAGCCATTAAAAGGAAAGGTTCCTGCAGGATC gagwt_hml (1457)
CCGAGTGCCAGAGCGCCATCAAGCCCCTGAAGGGCAAGGTGCCCGCCGGCAG gagopt_hml (1457)

AGATGTAATCTCAGAATATGTAAAAGCCTGTGATGGAATCGGAGGAGCTATG gagwt_hml (1509)
CGACGTGATCAGCGAGTACGTGAAGGCCTGCGACGGCATCGGCGGCGCCATG gagopt_hml (1509)

CATAAAGCTATGCTTATGGCTCAAGCAATAACAGGAGTTGTTTTAGGAGGAC gagwt_hml (1561)
CACAAGGCCATGCTGATGGCCCAGGCCATCACCGGCGTGGTGCTGGGCGGCC gagopt_hml (1561)

AAGTTAGAACATTTGGAAGAAAATGTTATAATTGTGGTCAAATTGGTCACTT gagwt_hml (1613)
AGGTGCGCACCTTCGGCCGCAAGTGCTACAACTGCGGCCAGATCGGCCACCT gagopt_hml (1613)

AAAAAAGAATTGCCCAGTCTTAAATAAACAGAATATAACTATTCAAGCAACT gagwt_hml (1665)
GAAGAAGAACTGCCCCGTGCTGAACAAGCAGAACATCACCATCCAGGCCACC gagopt_hml (1665)

ACAACAGGTAGAGAGCCACCTGACTTATGTCCAAGATGTAAAAAAGGAAAAC gagwt_hml (1717)
ACCACCGGCCGCGAGCCCCCCGACCTGTGCCCCCGCTGCAAGAAGGGCAAGC gagopt_hml (1717)

ATTGGGCTAGTCAATGTCGTTCTAAATTTGATAAAAATGGGCAACCATTGTC gagwt_hml (1769)
ACTGGGCCAGCCAGTGCCGCAGCAAGTTCGACAAGAACGGCCAGCCCCTGAG gagopt_hml (1769)

GGGAAACGAGCAAAGGGGCCAGCCTCAGGCCCCACAACAAACTGGGGCATTC gagwt_hml (1821)
CGGCAACGAGCAGCGCGGCCAGCCCCAGGCCCCCCAGCAGACCGGCGCCTTC gagopt_hml (1821)

CCAATTCAGCCATTTGTTCCTCAGGGTTTTCAGGGACAACAACCCCCACTGT gagwt_hml (1873)
CCCATCCAGCCCTTCGTGCCCCAGGGCTTCCAGGGCCAGCAGCCCCCCCTGA gagopt_hml (1873)

CCCAAGTGTTTCAGGGAATAAGCCAGTTACCACAATACAACAATTGTCCCCC gagwt_hml (1925)
GCCAGGTGTTCCAGGGCATCAGCCAGCTGCCCCAGTACAACAACTGCCCCCC gagopt_hml (1925)

GCCACAAGCGGCAGTGCAGCAG   TAG gagwt_hml (1977)
CCCCCAGGCCGCCGTGCAGCAGGCTTAA gagopt_hml (1977)

Prt manipulation:

Start with SEQ ID 63 (SEQ ID 71); manipulate to SEQ ID 64 (SEQ ID 72):

ATGTGGGCAACCATTGTCGGGAAACGAGCAAAGGGGCCAGCCTCAGGCCCCA Protwt_hml (1)
ATGTGGGCCACCATCGTGGGCAAGCGCGCCAAGGGCCCCGCCAGCGGCCCCA protopt_hml (1)

CAACAAACTGGGGCATTCCCAATTCAGCCATTTGTTCCTCAGGGTTTTCAGG Protwt_hml (53)
CCACCAACTGGGGCATCCCCAACAGCGCCATCTGCAGCAGCGGCTTCAGCGG protopt_hml (53)

GACAACAACCCCCACTGTCCCAAGTGTTTCAGGGAATAAGCCAGTTACCACA Protwt_hml (105)
CACCACCACCCCCACCGTGCCCAGCGTGAGCGGCAACAAGCCCGTGACCACC protopt_hml (105)

ATACAACAATTGTCCCCCGCCACAAGCGGCAGTGCAGCAGTAGATTTATGTA Protwt_hml (157)
ATCCAGCAGCTGAGCCCCGCCACCAGCGGCAGCGCCGCCGTGGACCTGTGCA protopt_hml (157)

CTATACAAGCAGTCTCTCTGCTTCCAGGGGAGCCCCCACAAAAAACCCCCAC Protwt_hml (209)
CCATCCAGGCCGTGAGCCTGCTGCCCGGCGAGCCCCCCCAGAAGACCCCCAC protopt_hml (209)

AGGGGTATATGGACCCCTGCCTAAGGGGACTGTAGGACTAATCTTGGGACGA Protwt_hml (261)
CGGCGTGTACGGCCCCCTGCCCAAGGGCACCGTGGGCCTGATCCTGGGCCGC protopt_hml (261)

TCAAGTCTAAATCTAAAAGGAGTTCAAATTCATACTAGTGTGGTTGATTCAG Protwt_hml (313)
AGCAGCCTGAACCTGAAGGGCGTGCAGATCCACACCAGCGTGGTGGACAGCG protopt_hml (313)

ACTATAAAGGCGAAATTCAATTGGTTATTAGCTCTTCAATTCCTTGGAGTGC Protwt_hml (365)
ACTACAAGGGCGAGATCCAGCTGGTGATCAGCAGCAGCATCCCCTGGAGCGC protopt_hml (365)

CAGTCCAAGAGACAGGATTGCTCAATTATTACTCCTGCCATACATTAAGGGT Protwt_hml (417)
CAGCCCCCGCGACCGCATCGCCCAGCTGCTGCTGCTGCCCTACATCAAGGGC protopt_hml (417)

GGAAATAGTGAAATAAAAAGAATAGGAGGGCTTGGAAGCACTGATCCAACAG Protwt_hml (469)
GGCAACAGCGAGATCAAGCGCATCGGCGGCCTGGGCAGCACCGACCCCACCG protopt_hml (469)

GAAAGGCTGCATATTGGGCAAGTCAGGTCTCAGAGAACAGACCTGTGTGTAA Protwt_hml (521)
GCAAGGCCGCCTACTGGGCCAGCCAGGTGAGCGAGAACCGCCCCGTGTGCAA protopt_hml (521)

GGCCATTATTCAAGGAAAACAGTTTGAAGGGTTGGTAGACACTGGAGCAGAT Protwt_hml (573)
GGCCATCATCCAGGGCAAGCAGTTCGAGGGCCTGGTGGACACCGGCGCCGAC protopt_hml (573)

GTCTCTATCATTGCTTTAAATCAGTGGCCAAAAAATTGGCCTAAACAAAAGG Protwt_hml (625)
GTGAGCATCATCGCCCTGAACCAGTGGCCCAAGAACTGGCCCAAGCAGAAGG protopt_hml (625)

CTGTTACAGGACTTGTCGGCATAGGCACAGCCTCAGAAGTGTATCAAAGTAC Protwt_hml (677)
CCGTGACCGGCCTGGTGGGCATCGGCACCGCCAGCGAGGTGTACCAGAGCAC protopt_hml (677)

GGAGATTTTACATTGCTTAGGGCCAGATAATCAAGAAAGTACTGTTCAGCCA Protwt_hml (729)
CGAGATCCTGCACTGCCTGGGCCCCGACAACCAGGAGAGCACCGTGCAGCCC protopt_hml (729)

ATGATTACTTCAATTCCTCTTAATCTGTGGGGTCGAGATTTATTACAACAAT Protwt_hml (781)
ATGATCACCAGCATCCCCCTGAACCTGTGGGGCCGCGACCTGCTGCAGCAGT protopt_hml (781)

GGGGTGCGGAAATCACCATGCCCGCTCCATCATATAGCCCCACGAGTCAAAA Protwt_hml (833)
GGGGCGCCGAGATCACCATGCCCGCCCCCAGCTACAGCCCCACCAGCCAGAA protopt_hml (833)

AATCATGACCAAGATGGGATATATACCAGGAAAGGGACTAGGGAAAAATGAA Protwt_hml (885)
GATCATGACCAAGATGGGCTACATCCCCGGCAAGGGCCTGGGCAAGAACGAG protopt_hml (885)

GATGGCATTAAAATTCCAGTTGAGGCTAAAATAAATCAAGAAAGAGAAGGAA Protwt_hml (937)
GACGGCATCAAGATCCCCGTGGAGGCCAAGATCAACCAGGAGCGCGAGGGCA protopt_hml (937)

TAGGGAATCCTTGC   TAG Protwt_hml (989)
TCGGCAACCCCTGCGCTTAA protopt_hml (989)

Pol manipulation:

Start with SEQ ID 65 (SEQ ID 73); manipulate to SEQ ID 66 (SEQ ID 74):

ATGAATAAATCAAGAAAGAGAAGGAATAGGGAATCCTTGCTAGGGGCGGCCA polwt_hml (1)
ATGAACAAGAGCCGCAAGCGCCGCAACCGCGAGAGCCTGCTGGGCGCCGCCA polopt_hml (1)

CTGTAGAGCCTCCTAAACCCATACCATTAACTTGGAAAACAGAAAAACCAGT polwt_hml (53)
CCGTGGAGCCCCCCAAGCCCATCCCCCTGACCTGGAAGACCGAGAAGCCCGT polopt_hml (53)

GTGGGTAAATCAGTGGCCGCTACCAAAACAAAAACTGGAGGCTTTACATTTA polwt_hml (105)
GTGGGTGAACCAGTGGCCCCTGCCCAAGCAGAAGCTGGAGGCCCTGCACCTG polopt_hml (105)

TTAGCAAATGAACAGTTAGAAAAGGGTCATATTGAGCCTTCGTTCTCACCTT polwt_hml (157)
CTGGCCAACGAGCAGCTGGAGAAGGGCCACATCGAGCCCAGCTTCAGCCCCT polopt_hml (157)

GGAATTCTCCTGTGTTTGTAATTCAGAAGAAATCAGGCAAATGGCGTATGTT polwt_hml (209)
GGAACAGCCCCGTGTTCGTGATCCAGAAGAAGAGCGGCAAGTGGCGCATGCT polopt_hml (209)

AACTGACTTAAGGGCTGTAAACGCCGTAATTCAACCCATGGGGCCTCTCCAA polwt_hml (261)
GACCGACCTGCGCGCCGTGAACGCCGTGATCCAGCCCATGGGCCCCCTGCAG polopt_hml (261)

CCCGGGTTGCCCTCTCCGGCCATGATCCCAAAAGATTGGCCTTTAATTATAA polwt_hml (313)
CCCGGCCTGCCCAGCCCCGCCATGATCCCCAAGGACTGGCCCCTGATCATCA polopt_hml (313)

TTGATCTAAAGGATTGCTTTTTTACCATCCCTCTGGCAGAGCAGGATTGCGA polwt_hml (365)
TCGACCTGAAGGACTGCTTCTTCACCATCCCCCTGGCCGAGCAGGACTGCGA polopt_hml (365)

AAAATTTGCCTTTACTATACCAGCCATAAATAATAAAGAACCAGCCACCAGG polwt_hml (417)
GAAGTTCGCCTTCACCATCCCCGCCATCAACAACAAGGAGCCCGCCACCCGC polopt_hml (417)

TTTCAGTGGAAAGTGTTACCTCAGGGAATGCTTAATAGTCCAACTATTTGTC polwt_hml (469)
TTCCAGTGGAAGGTGCTGCCCCAGGGCATGCTGAACAGCCCCACCATCTGCC polopt_hml (469)

AGACTTTTGTAGGTCGAGCTCTTCAACCAGTTAGAGAAAAGTTTTCAGACTG polwt_hml (521)
AGACCTTCGTGGGCCGCGCCCTGCAGCCCGTGCGCGAGAAGTTCAGCGACTG polopt_hml (521)

TTATATTATTCATTGTATTGATGATATTTTATGTGCTGCAGAAACGAAAGAT polwt_hml (573)
CTACATCATCCACTGCATCGACGACATCCTGTGCGCCGCCGAGACCAAGGAC polopt_hml (573)

AAATTAATTGACTGTTATACATTTCTGCAAGCAGAGGTTGCCAATGCTGGAC polwt_hml (625)
AAGCTGATCGACTGCTACACCTTCCTGCAGGCCGAGGTGGCCAACGCCGGCC polopt_hml (625)

TGGCAATAGCATCTGATAAGATCCAAACCTCTACTCCTTTTCATTATTTAGG polwt_hml (677)
TGGCCATCGCCAGCGACAAGATCCAGACCAGCACCCCCTTCCACTACCTGGG polopt_hml (677)

GATGCAGATAGAAAATAGAAAAATTAAGCCACAAAAAATAGAAATAAGAAAA polwt_hml (729)
CATGCAGATCGAGAACCGCAAGATCAAGCCCCAGAAGATCGAGATCCGCAAG polopt_hml (729)

GACACATTAAAAACACTAAATGATTTTCAAAAATTACTAGGAGATATTAATT polwt_hml (781)
GACACCCTGAAGACCCTGAACGACTTCCAGAAGCTGCTGGGCGACATCAACT polopt_hml (781)

GGATTCGGCCAACTCTAGGCATTCCTACTTATGCCATGTCAAATTTGTTCTC polwt_hml (833)
GGATCCGCCCCACCCTGGGCATCCCCACCTACGCCATGAGCAACCTGTTCAG polopt_hml (833)

TATCTTAAGAGGAGACTCAGACTTAAATAGTAAAAGAATGTTAACCCCAGAG polwt_hml (885)
CATCCTGCGCGGCGACAGCGACCTGAACAGCAAGCGCATGCTGACCCCCGAG polopt_hml (885)

GCAACAAAAGAAATTAAATTAGTGGAAGAAAAAATTCAGTCAGCGCAAATAA polwt_hml (937)
GCCACCAAGGAGATCAAGCTGGTGGAGGAGAAGATCCAGAGCGCCCAGATCA polopt_hml (937)

ATAGAATAGATCCCTTAGCCCCACTCCAACTTTTGATTTTTGCCACTGCACA polwt_hml (989)
ACCGCATCGACCCCCTGGCCCCCCTGCAGCTGCTGATCTTCGCCACCGCCCA polopt_hml (989)

TTCTCCAACAGGCATCATTATTCAAAATACTGATCTTGTGGAGTGGTCATTC polwt_hml (1041)
CAGCCCCACCGGCATCATCATCCAGAACACCGACCTGGTGGAGTGGAGCTTC polopt_hml (1041) 

CTTCCTCACAGTACAGTTAAGACTTTTACATTGTACTTGGATCAAATAGCTA polwt_hml (1093)
CTGCCCCACAGCACCGTGAAGACCTTCACCCTGTACCTGGACCAGATCGCCA polopt_hml (1093) 

CATTAATCGGTCAGACAAGATTACGAATAATAAAATTATGTGGGAATGACCC polwt_hml (1145)
CCCTGATCGGCCAGACCCGCCTGCGCATCATCAAGCTGTGCGGCAACGACCC polopt_hml (1145)

AGACAAAATAGTTGTCCCTTTAACCAAGGAACAAGTTAGACAAGCCTTTATC polwt_hml (1197) 
CGACAAGATCGTGGTGCCCCTGACCAAGGAGCAGGTGCGCCAGGCCTTCATC polopt_hml (1197) 

AATTCTGGTGCATGGAAGATTGGTCTTGCTAATTTTGTGGGAATTATTGATA polwt_hml (1249)
AACAGCGGCGCCTGGAAGATCGGCCTGGCCAACTTCGTGGGCATCATCGACA polopt_hml (1249)

ATCATTACCCAAAAACAAAGATCTTCCAGTTCTTAAAATTGACTACTTGGAT polwt_hml ( 1301 ) 
ACCACTACCCCAAGACCAAGATCTTCCAGTTCCTGAAGCTGACCACCTGGAT polopt_hml (1301)

TCTACCTAAAATTACCAGACGTGAACCTTTAGAAAATGCTCTAACAGTATTT polwt_hml (1353) 
CCTGCCCAAGATCACCCGCCGCGAGCCCCTGGAGAACGCCCTGACCGTGTTC polopt_hml (1353) 

ACTGATGGTTCCAGCAATGGAAAAGCAGCTTACACAGGACCGAAAGAACGAG polwt_hml (1405)
ACCGACGGCAGCAGCAACGGCAAGGCCGCCTACACCGGCCCCAAGGAGCGCG polopt_hml (1405)

TAATCAAAACTCCATATCAATCGGCTCAAAGAGCAGAGTTGGTTGCAGTCAT polwt_hml (1457)
TGATCAAGACCCCCTACCAGAGCGCCCAGCGCGCCGAGCTGGTGGCCGTGAT polopt_hml (1457)

TACAGTGTTACAAGATTTTGACCAACCTATCAATATTATATCAGATTCTGCA polwt_hml (1509) 
CACCGTGCTGCAGGACTTCGACCAGCCCATCAACATCATCAGCGACAGCGCC polopt_hml (1509) 

TATGTAGTACAGGCTACAAGGGATGTTGAGACAGCTCTAATTAAATATAGCA polwt_hml (1561)
TACGTGGTGCAGGCCACCCGCGACGTGGAGACCGCCCTGATCAAGTACAGCA polopt_hml (1561)

TGGATGATCAGTTAAACCAGCTATTCAATTTATTACAACAAACTGTAAGAAA polwt_hml (1613) 
TGGACGACCAGCTGAACCAGCTGTTCAACCTGCTGCAGCAGACCGTGCGCAA polopt_hml (1613) 

AAGAAATTTCCCATTTTATATTACACATATTCGAGCACACACTAATTTACCA polwt_hml (1665)
GCGCAACTTCCCCTTCTACATCACCCACATCCGCGCCCACACCAACCTGCCC polopt_hml (1665)

GGGCCTTTGACTAAAGCAAATGAACAAGCTGACTTACTGGT-ATCATCTGCA polwt_hml ( 1717 ) 
GGCCCCCTGACCAAGGCCAACGAGCAGGCCGACCTGCTGGTGAGCAGC-GCC polopt_hml (1717) 

CTCATAAAAGCACAAGAACTTCATGCTTTGACTCATGTAAATGCAGCAGGAT polwt_hml (1768)
CTGATCAAGGCCCAGGAGCTGCACGCCCTGACCCACGTGAACGCCGCCGGCC polopt_hml (1768)

TAAAAAACAAATTTGATGTCACATGGAAACAGGCAAAAGATATTGTACAACA polwt_hml (1820) 
TGAAGAACAAGTTCGACGTGACCTGGAAGCAGGCCAAGGACATCGTGCAGCA polopt_hml (1820) 

TTGCACCCAGTGTCAAGTCTTACACCTGCCCACTCAAGAGGCAGGAGTTAAT polwt_hml (1872) 
CTGCACCCAGTGCGAGGTGCTGCACCTGCCCACCCAGGAGGCCGGCGTGAAC polopt_hml (1872) 

CCCAGAGGTCTGTGTCCTAATGCATTATGGCAAATGGATGTCACGCATGTAC polwt_hml (1924)
CCCCGCGGCCTGTGCCCCAACGCCCTGTGGCAGATGGACGTGACCCACGTGC polopt_hml (1924) 

CTTCATTTGGAAGATTATCATATGTTCACGTAACAGTTGATACTTATTCACA polwt_hml (1976) 
CCAGCTTCGGCCGCCTGAGCTACGTGCACGTGACCGTGGACACCTACAGCCA polopt_hml (1976)

TTTCATATGGGCAACTTGCCAAACAGGAGAAAGTACTTCCCATGTTAAAAAA polwt_hml (2028) 
CTTCATCTGGGCCACCTGCCAGACCGGCGAGAGCACCAGCCACGTGAAGAAG polopt_hml ( 2028 ) 

CATTTATTGTCTTGTTTTGCTGTAATGGGAGTTCCAGAAAAAATCAAAACTG polwt_hml (2080) 
CACCTGCTGAGCTGCTTCGCCGTGATGGGCGTGCCCGAGAAGATCAAGACCG polopt_hml (2080) 

ACAATGGACCAGGATATTGTAGTAAAGCTTTCCAAAAATTCTTAAGTCAGTG polwt_hml (2132) 
ACAACGGCCCCGGCTACTGCAGCAAGGCCTTCCAGAAGTTCCTGAGCCAGTG polopt_hml ( 2132 ) 

GAAAATTTCACATACAACAGGAATTCCTTATAATTCCCAAGGACAGGCCATA polwt_hml (2184)
GAAGATCAGCCACACCACCGGCATCCCCTACAACAGCCAGGGCCAGGCCATC polopt_hml (2184)

GTTGAAAGAACTAATAGAACACTCAAAACTCAATTAGTTAAACAAAAAGAAG polwt_hml (2236) 
GTGGAGCGCACCAACCGCACCCTGAAGACCCAGCTGGTGAAGCAGAAGGAGG polopt_hml (2236) 

GGGGAGACAGTAAGGAGTGTACCACTCCTCAGATGCAACTTAATCTAGCACT polwt_hml (2288) 
GCGGCGACAGCAAGGAGTGCACCACCCCCCAGATGCAGCTGAACCTGGCCCT polopt_hml (2288) 

CTATACTTTAAATTTTTTAAACATTTATAGAAATCAGACTACTACTTCTGCA polwt_hml (2340) 
GTACACCCTGAACTTCCTGAACATCTACCGCAACCAGACCACCACCAGCGCC polopt_hml (2340)

GAACAACATCTTACTGGTAAAAAGAACAGCCCACATGAAGGAAAACTAATTT polwt_hml (2392) 
GAGCAGCACCTGACCGGCAAGAAGAACAGCCCCCACGAGGGCAAGCTGATCT polopt_hml (2392) 

GGTGGAAAGATAATAAAAATAAGACATGGGAAATAGGGAAGGTGATAACGTG polwt_hml (2444)
GGTGGAAGGACAACAAGAACAAGACCTGGGAGATCGGCAAGGTGATCACCTG polopt_hml (2444)

GGGGAGAGGTTTTGCTTGTGTTTCACCAGGAGAAAATCAGCTTCCTGTTTGG polwt_hml (2496)
GGGCCGCGGCTTCGCCTGCGTGAGCCCCGGCGAGAACCAGCTGCCCGTGTGG polopt_hml (2496) 

ATACCCACTAGACATTTGAAGTTCTACAATGAACCCATCAGAGATGCAAAGA polwt_hml (2548)
ATCCCCACCCGCCACCTGAAGTTCTACAACGAGCCCATCCGCGACGCCAAGA polopt_hml (2548)

AAAGCACCTCCGCGGAGACGGAGACATCGCAATCGAGCACCGTTGACTCACA polwt_hml (2600)
AGAGCACCAGCGCCGAGACCGAGACCAGCCAGAGCAGCACCGTGGACAGCCA polopt_hml (2600)

AGATGAACAAAATGGTGACGTCAGAAGAACAGATGAAGTTGCCATCCACCAA polwt_hml (2652)
GGACGAGCAGAACGGCGACGTGCGCCGCACCGACGAGGTGGCCATCCACCAG polopt_hml (2652)

GAAGGCAGAGCCGCCAACTTGGGCACAACTAAAGAAGCTGACGCAGTTAGCT polwt_hml (2704)
GAGGGCCGCGCCGCCAACCTGGGCACCACCAAGGAGGCCGACGCCGTGAGCT polopt_hml (2704)

ACAAAATATCTAGAGAACACAAAGGTGACACAAACCCCAGAGAGTATGCTGC polwt_hml (2756)
ACAAGATCAGCCGCGAGCACAAGGGCGACACCAACCCCCGCGAGTACGCCGC polopt_hml (2756)

TTGCAGCCTTGATGATTGTATCAATGGTGGTAAGTCTCCCTATGCCTGCAGG polwt_hml ( 2808 ) 
CTGCAGCCTGGACGACTGCATCAACGGCGGCAAGAGCCCCTACGCCTGCCGC polopt_hml (2808) 

AGCAGCTGCAGC   TAA polwt_hml (2860)
AGCAGCTGCAGCGCTTAA polopt_hml (2860)

Env manipulation: 

Start with SEQ ID 81 (SEQ ID 83); manipulate to SEQ ID 82: 

envwt_HML2 ATGAACCCAAGCGAGATGCAAAGAAAAGCACCTCCGCGGAGACGGAGACATCGCAATCGA 
envopt_HML2 ATGAACCCCAGCGAGATGCAGCGCAAGGCCCCCCCCCGCCGCCGCCGCCACCGCAACCGC

envwt_HML2 GCACCGTTGACTCACAAGATGAACAAAATGGTGACGTCAGAAGAACAGATGAAGTTGCCA 
envopt_HML2 GCCCCCCTGACCCACAAGATGAACAAGATGGTGACCAGCGAGGAGCAGATGAAGCTGCCC 

envwt_HML2 TCCACCAAGAAGGCAGAGCCGCCAACTTGGGCACAACTAAAGAAGCTGACGCAGTTAGCT
envopt_HML2 AGCACCAAGAAGGCCGAGCCCCCCACCTGGGCCCAGCTGAAGAAGCTGACCCAGCTGGCC

envwt_HML2 ACAAAATATCTAGAGAACACAAAGGTGACACAAACCCCAGAGAGTATGCTGCTTGCAGCC 
envopt_HML2 ACCAAGTACCTGGAGAACACCAAGGTGACCCAGACCCCCGAGAGCATGCTGCTGGCCGCC 



envwt_HML2 TTGATGATTGTATCAATGGTGGTAAGTCTCCCTATGCCTGCAGGAGCAGCTGCAGCTAAC
envopt_HML2 CTGATGATCGTGAGCATGGTGGTGAGCCTGCCCATGCCCGCCGGCGCCGCCGCCGCCAAC

envwt_HML2 TATACCTACTGGGCCTATGTGCCTTTCCCGCCCTTAATTCGGGCAGTCACATGGATGGAT 

envopt_HML2 TACACCTACTGGGCCTACGTGCCCTTCCCCCCCCTGATCCGCGCCGTGACCTGGATGGAC 

envwt_HML2 AATCCTACAGAAGTATATGTTAATGATAGTGTATGGGTACCTGGCCCCATAGATGATCGC 

envopt_HML2 AACCCCACCGAGGTGTACGTGAACGACAGCGTGTGGGTGCCCGGCCCCATCGACGACCGC 

envwt_HML2 TGCCCTGCCAAACCTGAGGAAGAAGGGATGATGATAAATATTTCCATTGGGTATCATTAT

envopt_HML2 TGCCCCGCCAAGCCCGAGGAGGAGGGCATGATGATCAACATCAGCATCGGCTACCACTAC 

envwt_HML2 CCTCCTATTTGCCTAGGGAGAGCACCAGGATGTTTAATGCCTGCAGTCCAAAATTGGTTG 

envopt_HML2 CCCCCCATCTGCCTGGGCCGCGCCCCCGGCTGCCTGATGCCCGCCGTGCAGAACTGGCTG 

envwt_HML2 GTAGAAGTACCTACTGTCAGTCCCATCTGTAGATTCACTTATCACATGGTAAGCGGGATG 

envopt_HML2 GTGGAGGTGCCCACCGTGAGCCCCATCTGCCGCTTCACCTACCACATGGTGAGCGGCATG 

envwt_HML2 TCACTCAGGCCACGGGTAAATTATTTACAAGACTTTTCTTATCAAAGATCATTAAAATTT 

envopt_HML2 AGCCTGCGCCCCCGCGTGAACTACCTGCAGGACTTCAGCTACCAGCGCAGCCTGAAGTTC 

envwt_HML2 AGACGTAAAGGGAAACCTTGCCCCAAGGAAATTCCCAAAGAATCAAAAAATACAGAAGTT 

envopt_HML2 CGCCCCAAGGGCAAGCCCTGCCCCAAGGAGATCCCCAAGGAGAGCAAGAACACCGAGGTG 

envwt_HML2 TTAGTTTGGGAAGAATGTGTGGCCAATAGTGCGGTGATATTACAAAACAATGAATTCGGA 

envopt_HML2 CTGGTGTGGGAGGAGTGCGTGGCCAACAGCGCCGTGATCCTGCAGAACAACGAGTTCGGC 

envwt_HML2 ACTATTATAGATTGGGCACCTCGAGGTCAATTCTACCACAATTGCTCAGGACAAACTCAG 

envopt_HML2 ACCATCATCGACTGGGCCCCCCGCGGCCAGTTCTACCACAACTGCAGCGGCCAGACCCAG 

envwt_HML2 TCGTGTCCAAGTGCACAAGTGAGTCCAGCTGTTGATAGCGACTTAACAGAAAGTTTAGAC 

envopt_HML2 AGCTGCCCCAGCGCCCAGGTGAGCCCCGCCGTGGACAGCGACCTGACCGAGAGCCTGGAC 

envwt_HML2 AAACATAAGCATAAAAAATTGCAGTCTTTCTACCCTTGGGAATGGGGAGAAAAAGGAATC 

envopt_HML2 AAGCACAAGCACAAGAAGCTGCAGAGCTTCTACCCCTGGGAGTGGGGCGAGAAGGGCATC 

envwt_HML2 TCTACCCCAAGACCAAAAATAGTAAGTCCTGTTTCTGGTCCTGAACATCCAGAATTATGG 

envopt_HML2 AGCACCCCCCGCCCCAAGATCGTGAGCCCCGTGAGCGGCCCCGAGCACCCCGAGCTGTGG 

envwt_HML2 AGGCTTACTGTGGCTTCACACCACATTAGAATTTGGTCTGGAAATCAAACTTTAGAAACA 

envopt_HML2 CGCCTGACCGTGGCCAGCCACCACATCCGCATCTGGAGCGGCAACCAGACCCTGGAGACC 

envwt_HML2 AGAGATCGTAAGCCATTTTATACTATTGACCTGAATTCCAGTCTAACAGTTCCTTTACAA 

envopt_HML2 CGCGACCGCAAGCCCTTCTACACCATCGACCTGAACAGCAGCCTGACCGTGCCCCTGCAG 

envwt_HML2 AGTTGCGTAAAGCCCCCTTATATGCTAGTTGTAGGAAATATAGTTATTAAACCAGACTCC 

envopt_HML2 AGCTGCGTGAAGCCCCCCTACATGCTGGTGGTGGGCAACATCGTGATCAAGCCCGACAGC 

envwt_HML2 CAGACTATAACCTGTGAAAATTGTAGATTGCTTACTTGCATTGATTCAACTTTTAATTGG 

envopt_HML2 CAGACCATCACCTGCGAGAACTGCCGCCTGCTGACCTGCATCGACAGCACCTTCAACTGG 

envwt_HML2 CAACACCGTATTCTGCTGGTGAGAGCAAGAGAGGGCGTGTGGATCCCTGTGTCCATGGAC 

envopt_HML2 CAGCACCGCATCCTGGTGGTGCGCGCCCGCGAGGGCGTGTGGATCCCCGTGAGCATGGAC 

envwt_HML2 CGACCGTGGGAGGCCTCGCCATCCGTCCATATTTTGACTGAAGTATTAAAAGGTGTTTTA 

envopt_HML2 CGCCCCTGGGAGGCCAGCCCCAGCGTGCACATCCTGACCGAGGTGCTGAAGGGCGTGCTG 

envwt_HML2 AATAGATCCAAAAGATTCATTTTTACTTTAATTGCAGTGATTATGGGATTAATTGCAGTC 

envopt_HML2 AACCGCAGCAAGCGCTTCATCTTCACCCTGATCGCCGTGATCATGGGCCTGATCGCCGTG 

envwt_HML2 ACAGCTACGGCTGCTGTAGCAGGAGTTGCATTGCACTCTTCTGTTCAGTCAGTAAACTTT 

envopt_HML2 ACCGCCACCGCCGCCGTGGCCGGCGTGGCCCTGCACAGCAGCGTGCAGAGCGTGAACTTC 

envwt_HML2 GTTAATGATTGGCAAAAAAATTCTACAAGATTGTGGAATTCACAATCTAGTATTGATCAA

envopt_HML2 GTGAACGACTGGCAGAAGAACAGCACCCGCCTGTGGAACAGCCAGAGCAGCATCGACCAG 

envwt_HML2 AAATTGGCAAATCAAATTAATGATCTTAGACAAACTGTCATTTGGATGGGAGACAGACTC 

envopt_HML2 AAGCTGGCCAACCAGATCAACGACCTGCGCCAGACCGTGATCTGGATGGGCGACCGCCTG 

envwt_HML2 ATGAGCTTAGAACATCGTTTCCAGTTACAATGTGACTGGAATACGTCAGATTTTTGTATT

envopt_HML2 ATGAGCCTGGAGCACCGCTTCCAGCTGCAGTGCGACTGGAACACCAGCGACTTCTGCATC

envwt_HML2 ACACCCCAAATTTATAATGAGTCTGAGCATCACTGGGACATGGTTAGACGCCATCTACAG 

envopt_HML2 ACCCCCCAGATCTACAACGAGAGCGAGCACCACTGGGACATGGTGCGCCGCCACCTGCAG 


envwt_HML2 GGAAGAGAAGATAATCTCACTTTAGACATTTCCAAATTAAAAGAACAAATTTTCGAAGCA 

envopt_HML2 GGCCGCGAGGACAACCTGACCCTGGACATCAGCAAGCTGAAGGAGCAGATCTTCGAGGCC 

envwt_HML2 TCAAAAGCCCATTTAAATTTGGTGCCAGGAACTGAGGCAATTGCAGGAGTTGCTGATGGC 

envopt_HML2 AGCAAGGCCCACCTGAACCTGGTGCCCGGCACCGAGGCCATCGCCGGCGTGGCCGACGGC

envwt_HML2 CTCGCAAATCTTAACCCTGTCACTTGGGTTAAGACCATTGGAAGTACTACGATTATAAAT 

envopt_HML2 CTGGCCAACCTGAACCCCGTGACCTGGGTGAAGACCATCGGCAGCACCACCATCATCAAC 

envwt_HML2 CTCATATTAATCCTTGTGTGCCTGTTTTGTCTGTTGTTAGTCTGCAGGTGTACCCAACAG

envopt_HML2 CTGATCCTGATCCTGGTGTGCCTGTTCTGCCTGCTGCTGGTGTGCCGCTGCACCCAGCAG 






envwt_HML2 CTCCGAAGAGACAGCGACCATCGAGAACGGGCCATGATGACGATGGCGGTTTTGTCGAAA 

envopt_HML2 CTGCGCCGCGACAGCGACCACCGCGAGCGCGCCATGATGACCATGGCCGTGCTGAGCAAG 

envwt_HML2 AGAAAAGGGGGAAATGTGGGGAAAAGCAAGAGAGATCAGATTGTTACTGTGTCTGTGGCfcTAA 

envopt_HML2 CGCAAGGGCGGCAACGTGGGCAAGAGCAAGCGCGACCAGATCGTGACCGTGAGCGTGGGCtaa 



IN VITRO EXPRESSION OF GAG SEQUENCES

Three different gag-encoding sequences were cloned into the pCMVkm2 vector:

(1) gag opt HML-2 (SEQ ID 54, including SEQ ID 62 and encoding SEQ ID 70 - Fig. 5). 

(2) gag opt PCAV (SEQ ID 80, including SEQ ID 77 and encoding SEQ ID 79 - Fig. 8). 

(3) gag wt PCAV (SEQ ID 53, including SEQ ID 76 and encoding SEQ ID 78 - Fig. 4). 

The vectors were used to transfect 293 cells in duplicate in 6-well plates, using the
polyamine reagent Transit™ LT-1 (PanVera Corp, Madison WI) plus 2 µg DNA.

Cells were lysed after 48 hours and analyzed by western blot using pooled mouse antibody
against HML2-gag as the primary antibody (1:400), and goat anti-mouse HRP as the secondary
antibody (1:20000). Figure 10 shows that 'gag opt PCAV' (lane 2) expressed much more
efficiently than 'gag wt PCAV' (lane 3). Lane 1 ('gag opt HML-2') is more strongly stained than
lane 2 ('gag opt PCAV'), but this could be due to the fact that the primary antibody was raised
against the homologous HML-2 protein, rather than reflecting a difference in expression
efficiency. To address this question, antibodies were also raised against the PCAV product and
were used for western blotting. Figure 11A shows results using the anti-HML2 as the primary
antibody (1:500), and Figure 11B shows the results with anti-PCAV (1:500). Each antibody
stains the homologous protein more strongly than the heterologous protein.

NUCLEIC ACID IMMUNIZATION 

Vectors of the invention are purified from bacteria and used to immunize mice. 
T CELL RESPONSES TO PCAV GAG

CB6F1 mice were intramuscularly immunized with pCMVKm2 vectors encoding PCAV
gag (Figures 4 & 8) and induction of gag-specific CD4+ and CD8+ cells was measured.

Mice received four injections of 50 µg plasmid at weeks 0, 2, 4 and 6. These plasmids
included the wild type gag sequence (SEQ ID 76). Mice were then split into two separate groups
for further work.

The first group of three mice received a further 50 µg of plasmid at 25 weeks, but this
plasmid included the optimized gag sequence (SEQ ID 77). Eleven days later spleens were
harvested and pooled and a single cell suspension was prepared for culture. Spleen cells (1 x 10^6
per culture) were cultured overnight at 37°C in the absence ("unstimulated") or presence
("stimulated") of 1 x 10^7 plaque-forming units (pfu) of a recombinant vaccinia which contains
the PCAV gag sequence ("rVV-gag", produced by homologous recombination of cloning vector
pSC11 [116], followed by plaque purification of recombinant rVV-gag). Duplicate stimulated and
unstimulated cultures were prepared. The following day Brefeldin A was added to block
cytokine secretion and cultures were continued for 2 hours. Cultures were then harvested and
stained with fluorescently-labeled monoclonal antibodies for cell surface CD8 and intracellular
gamma interferon (IFN-γ). Stained samples were analyzed by flow cytometry and the fraction of
CD8+ cells that stained positively for intracellular IFN-γ was determined. Results were as
follows:



Culture condition    Culture #1    Culture #2    Average
Unstimulated         0.10          0.14          0.12
Stimulated           1.51          1.27          1.39
Difference                                       1.27



After subtraction of the unstimulated background, an average of 1.27% of the pooled splenic
CD8+ cells synthesized IFN-γ in response to stimulation with rVV-gag. This demonstrates that
the DNA immunization induced CD8+ T cells
that specifically recognized and responded to PCAV gag.
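
The averages and the background-subtracted difference in the table above can be reproduced with a short calculation. The following Python sketch uses only the four duplicate-culture values reported in the table; the variable names are illustrative.

```python
from statistics import mean

# Percent of CD8+ cells staining positive for intracellular IFN-gamma,
# copied from the duplicate cultures in the table above.
unstimulated = [0.10, 0.14]
stimulated = [1.51, 1.27]

background = mean(unstimulated)     # average without rVV-gag
response = mean(stimulated)         # average with rVV-gag
specific = response - background    # background-subtracted response

print(f"unstimulated average: {background:.2f}%")
print(f"stimulated average:   {response:.2f}%")
print(f"difference:           {specific:.2f}%")
```

Rounded to two decimal places, the calculation gives the same average and difference values as the table.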

The second group of four mice received a further 50 µg of plasmid at 28 weeks, but this
plasmid included the optimized gag sequence (SEQ ID 77). Twelve days later spleens were
harvested. As a specificity control, a spleen was also obtained from a CB6F1 mouse that had
been vaccinated with a pCMVKm2 vector encoding HML2 env.

Single cell suspensions from individual spleens were prepared for culture. Spleen cells
(1 x 10^6 per culture) were cultured overnight at 37°C in the absence of stimulation or in the
presence of 1 x 10^7 pfu rVV-gag. As a specificity control, additional cultures contained another
recombinant vaccinia virus, rVV-HIVgp160env.SF162 ("rVV-HIVenv" - contains full-length
env gene from SF162 isolate of HIV-1), which was not expected to cross-react with either gag or
env from PCAV.

Duplicate cultures were prepared for each condition. The following day Brefeldin A was
added to block cytokine secretion and anti-CD28 antibody was added to co-stimulate CD4 T
cells. Cultures were continued for 2 hours and then harvested and stained with
fluorescently-labeled monoclonal antibodies for cell surface CD8 and CD4 and intracellular IFN-γ. Stained
samples were analyzed by flow cytometry and the fractions of CD8+CD4- and CD4+CD8- T cells
that stained positively for intracellular IFN-γ were determined. Results are shown in the
following table, expressed as the % of stained cells in response to stimulation by either PCAV
gag or HIV env during spleen culture, after subtraction of the average value seen with cells
which were not stimulated during spleen culture:



       Spleen culture    Vector administered at 28 weeks
       stimulation       PCAV gag   PCAV gag   PCAV gag   PCAV gag   PCAV env
CD8    PCAV gag          1.32       1.88       3.00       2.09       0.13
       HIV env           0.04       0.12       -0.02      0.23       0.05
CD4    PCAV gag          0.26       0.17       0.40       0.22       -0.01
       HIV env           0.01       -0.02      -0.03      0.01       -0.02



For the 4 mice that had been vaccinated with a vector encoding PCAV gag, therefore, the
rVV-gag vector stimulated 1.32% to 3.00% of CD8+ T cells to produce IFN-γ. However, there
were few CD8+ T cells (≤0.23%) that responded to the irrelevant rVV-HIVgp160env vector. The
CD8+ T cell response is thus specific to PCAV gag. Furthermore, the control mouse that was
immunized with PCAV env had very few CD8+ T cells (0.13%) which responded to the rVV-gag
stimulation.

Similarly, vaccination with PCAV gag, but not with PCAV env, induced CD4+ T cells 
specific for PCAV gag (0.17% to 0.40%). 

DNA immunization with vectors encoding PCAV gag thus induces CD8+ and CD4+ T
cells that specifically recognize and respond to the PCAV gag antigen.
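
The ranges quoted in the preceding paragraphs can be checked directly against the per-mouse table. The following Python sketch is illustrative only: the values are copied from the table, the entry printed as "0,04" is assumed to be 0.04, and the names `responses`, `gag_mouse_range` and `control_value` are hypothetical helpers rather than anything defined in the specification.

```python
# Background-subtracted % of IFN-gamma-positive T cells, copied from the table.
# Column order: four mice boosted with PCAV gag, then the env-vaccinated control.
# The entry printed as "0,04" in the table is assumed to be 0.04.
responses = {
    ("CD8", "PCAV gag"): [1.32, 1.88, 3.00, 2.09, 0.13],
    ("CD8", "HIV env"): [0.04, 0.12, -0.02, 0.23, 0.05],
    ("CD4", "PCAV gag"): [0.26, 0.17, 0.40, 0.22, -0.01],
    ("CD4", "HIV env"): [0.01, -0.02, -0.03, 0.01, -0.02],
}
N_GAG_MICE = 4


def gag_mouse_range(subset, stimulus):
    """Return (min, max) over the four gag-vaccinated mice."""
    values = responses[(subset, stimulus)][:N_GAG_MICE]
    return min(values), max(values)


def control_value(subset, stimulus):
    """Return the value for the single env-vaccinated control mouse."""
    return responses[(subset, stimulus)][N_GAG_MICE]


for subset, stimulus in responses:
    low, high = gag_mouse_range(subset, stimulus)
    ctrl = control_value(subset, stimulus)
    print(f"{subset} / {stimulus}: gag-vaccinated {low:.2f} to {high:.2f}, control {ctrl:.2f}")
```

With the table values as given, the gag-stimulated ranges for the four gag-vaccinated mice are 1.32 to 3.00 for CD8 and 0.17 to 0.40 for CD4, and the largest response to the irrelevant HIV env stimulus is 0.23.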

VIRUS-LIKE PARTICLES 

293 cells were fixed 48 hours after transient transfection with pCMV-gag, either from
HML-2 or from PCAV, and inspected by electron microscopy (Figure 12). VLPs were produced
in both cases, but these were mainly intracellular for PCAV and mainly secreted for HML-2.

The assembly of viable VLPs from PCAV and HML-2 indicates that the gag protein has
retained its essential activity even though the endogenous virus is "dormant" and might thus be
expected to be subject to mutational inactivation.

The above description of preferred embodiments of the invention has been presented by
way of illustration and example for purposes of clarity and understanding. It is not intended to be
exhaustive or to limit the invention to the precise forms disclosed. It will be readily apparent to
those of ordinary skill in the art in light of the teachings of this invention that many changes and
modifications may be made thereto without departing from the spirit of the invention. It is
intended that the scope of the invention be defined by the appended claims and their equivalents.
SEQUENCE LISTING INDEX



SEQ ID     DESCRIPTION
1-9        Gag sequences
10-14      Prt sequences
15-21      Pol sequences
22-28      Env sequences
29-31      cORF sequences
32-37      PCAP sequences
38-50      Splice variants A-M sequences
51         pCMVKm2.cORFopt HML-2 (Figure 2)
52         pCMVKm2.PCAPopt HML-2 (Figure, number illegible)
53         pCMVKm2.gag wt PCAV (Figure 4)
54         pCMVKm2.gagopt HML-2 (Figure 5)
55         pCMVKm2.Protopt HML-2 (Figure 6)
56         pCMVKm2.Polopt HML-2 (Figure 7)
57-66      Nucleotide sequences pre- and post-manipulation
67         Manipulated cORF
68         Manipulated PCAP
69 & 70    Gag, pre- and post-manipulation
71 & 72    Prt, pre- and post-manipulation
73 & 74    Pol, pre- and post-manipulation
75         PCAV, from the beginning of its first 5' LTR to the end of its fragmented 3' LTR
76 & 77    PCAV Gag nucleotide sequences, pre- and post-manipulation
78 & 79    PCAV Gag amino acid sequences, pre- and post-manipulation
80         pCMVKm2.gagopt PCAV (Figure 8)
81         Wild-type env from HML-2
82         Optimized env from HML-2
83         Amino acid sequence encoded by SEQ IDs 81 & 82



NB: 

- SEQ IDs 1 to 9 disclosed in reference 1 as SEQ IDs 85, 91, 97, 102, 92, 98, 103, 104 & 146
- SEQ IDs 10 to 14 disclosed in reference 1 as SEQ IDs 86, 99, 105, 106 & 147
- SEQ IDs 15 to 21 disclosed in reference 1 as SEQ IDs 87, 93, 100, 107, 94, 108 & 148
- SEQ IDs 22 to 28 disclosed in reference 1 as SEQ IDs 88, 95, 101, 107, 96, 108 & 149
- SEQ IDs 29 to 31 disclosed in reference 1 as SEQ IDs 89, 90 & 109
- SEQ IDs 32 to 37 disclosed in reference 1 as SEQ IDs 10, 11, 12, 7, 8 & 9
- SEQ IDs 38 to 50 disclosed in reference 1 as SEQ IDs 28-37, 39, 41 & 43
- SEQ ID 75 disclosed in reference 3 as SEQ ID 1.